library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("ggmap")
## Warning: package 'ggmap' was built under R version 3.3.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.2
library("lubridate")
## Warning: package 'lubridate' was built under R version 3.3.2
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library("magrittr")
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:ggmap':
##
## inset
library("tidyr")
## Warning: package 'tidyr' was built under R version 3.3.2
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
##
## extract
library("tibble")
r <- 6378137/1000
p <- 2*pi*r / 360
a <- cos(39*pi/180)
Latitude starts at 0 at the equator, goes up to 180 degrees at the North Pole, and down to -180 degrees at the South Pole. Longitude stats at 0 at the Greenwich Meridian, goes up to 180 as you head west and down to -180 as you head east. The radius of the earth is roughly 6378.137. Multiply this by 2 pi and divide by 360 to get 111.3 kilometers per degree of latitude. Longitude is a bit trickier. At the equator, a degree of longitude is the same as a degree of latitude. But things shrink as you move towards the poles. You have to adjust for this, and the adjustment factor equals the cosine of the latitude. So, for example, Kansas City is roughly at 39 degrees of latitude, so the adjustment factor is 0.777, making a degree of longitude only 86.5 kilometers.
My runs are short enough that you can ignore the curvature of the earth, but it can become an issue at longer distances. Consider the state of Colorado. It spans latitudes 37 degrees N to 41 degrees N, and longitudes 102 degrees 3 minutes W to 109 degrees 3 minutes W. At 37 degrees of latitude, seven degrees of longitude is 622.3 kilometers, but higher up at 41 degrees of latitude it is only 588.1 kilometers.
Anyone who travels great distances can also take advantage of the curvature of the earth. Look at the map of the shortest route from San Francisco to Naples.
http://www.gcmap.com/mapui?P=SFO-NAP
These two cities are about at the same latitude, so you would think that the shortest path would be across the middle of the United States and across Spain and Portugal in Europe. But if you head through Canada, Greenland, and Northern Europe you get there a lot faster. You cross many of your longitude degrees closer to the North Pole where the degrees are much closer together. This more than compensates for swinging northeast at the start and then curving back to the southeast halfway through the trip.
I won’t bother with the earth’s curvature in any of my calculations.
If you really wanted to get things right though, you have to adjust for the fact that the earth is not a perfect sphere (it is slightly flatter at the poles than at the equator). The R bloggers site has a nice summary of the more accurate formulas at
https://www.r-bloggers.com/great-circle-distance-calculations-in-r/.
Another post at the R bloggers site mentions functions for calculating distances from longitude and latitude values using one of two R libraries: Imap and geosphere. See
http://nagraj.net/notes/calculating-geographic-distance-with-r/
One last thing: I usually get this wrong, but when you draw a plot, longitude goes on the x-axis and latitude goes on the y-axis. I am writing it down here so I won’t forget it again.
I run a couple of miles every second or third day and participate in a few timed races (mostly 5K and 4 mile events) from time to time. I record my runs using an iPhone app, MotionX-GPS. It produces an xml file that includes geographic positions and time throughout the run. Although MotionX-GPS produces nice plots of my runs, I wanted to produce something a bit more detailed using R. Here’s an example using the data from a 5K race on January 1, 2017.
The file Track 502.gpx should be available at my github site.
"Track 502.gpx" %>% read.delim(header=FALSE, as.is=TRUE, sep="~") -> gpx_lines
The “interesting” lines in this file look something like
<trkpt lat=38.8788925 lon=-94.6658353>,
<ele>295.312</ele>, or
<time>2017-01-01T16:02:04.212Z</time>
There are other lines in the file which you can ignore safely. The first step is to pull out the relevant pieces of information amid all the xml code. I use regular expressions to do this.
path_distance <- function(lon, lat) {
n <- length(lon)
m1 <- as.matrix(cbind(lon, lat))
m2 <- m1[c(1,1:(n-1)), ]
bearing(m1, m2)
distCosine(m1, m2)
}
gpx <- NULL
gpx_lines$V1 %>%
set_names("tim") %>%
grep("<time>", ., value=TRUE) %>%
sub(".+T", "", .) %>%
sub("Z</time>", "", .) %>%
hms %>%
diff.POSIXt %>%
as.numeric %>%
cumsum %>%
round %>%
append(0, .) %>%
tibble %>%
set_colnames("tim") %>%
bind_cols(gpx) -> gpx
## Note: method with signature 'Period#ANY' chosen for function '-',
## target signature 'Period#Period'.
## "ANY#Period" would also be valid
gpx_lines$V1 %>%
set_names("lat") %>%
grep("trkpt lat=", ., value=TRUE) %>%
sub("<trkpt lat=", "", .) %>%
sub(" lon=.+", "", .) %>%
as.numeric %>%
tibble %>%
set_colnames("lat") %>%
bind_cols(gpx) -> gpx
gpx_lines$V1 %>%
set_names("lon") %>%
grep("trkpt lat=", ., value=TRUE) %>%
sub(".+ lon=", "", .) %>%
sub(">", "", .) %>%
as.numeric %>%
tibble %>%
set_colnames("lon") %>%
bind_cols(gpx) -> gpx
adj <- cos(gpx$lat[1]*pi/180)
gpx$xkm <- 1000*append(0, diff(gpx$lon)*p*adj)
gpx$ykm <- 1000*append(0, diff(gpx$lat)*p)
gpx$dst <- cumsum(sqrt(gpx$xkm^2+gpx$ykm^2))
head(gpx)
## # A tibble: 6 × 6
## lon lat tim xkm ykm dst
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -94.66584 38.87889 0 0.0000000 0.000000 0.000000
## 2 -94.66583 38.87888 2 0.4766266 -1.603001 1.672359
## 3 -94.66579 38.87878 11 3.2063970 -10.853650 12.989722
## 4 -94.66574 38.87869 17 4.6882723 -10.230261 24.243083
## 5 -94.66568 38.87859 21 5.2255605 -10.797991 36.239044
## 6 -94.66562 38.87850 25 4.7575998 -10.686671 47.936894
# names(gpx) <- c("lon", "lat", "tim", "xkm", "ykm", "dst")
tail(gpx)
## # A tibble: 6 × 6
## lon lat tim xkm ykm dst
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -94.66570 38.87842 2341 0.0693275 11.699678 5056.212
## 2 -94.66570 38.87851 2346 -0.3033078 10.642143 5066.858
## 3 -94.66572 38.87861 2350 -1.1872335 11.321192 5078.241
## 4 -94.66576 38.87871 2355 -3.5530345 11.310060 5090.096
## 5 -94.66580 38.87882 2360 -3.4403773 11.688547 5102.281
## 6 -94.66584 38.87890 2367 -4.1163204 8.582733 5111.800
The ggmap library has a “one stop shopping” function, qmplot (similar to qplot in ggplot2) that will automate the process of displaying geographic co-ordinates on a graph.
qmplot(lon, lat, data=gpx)
## Using zoom = 17...
## Map from URL : http://tile.stamen.com/toner-lite/17/31068/50147.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31069/50147.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31070/50147.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31071/50147.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31072/50147.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31068/50148.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31069/50148.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31070/50148.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31071/50148.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31072/50148.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31068/50149.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31069/50149.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31070/50149.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31071/50149.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31072/50149.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31068/50150.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31069/50150.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31070/50150.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31071/50150.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31072/50150.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31068/50151.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31069/50151.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31070/50151.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31071/50151.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31072/50151.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31068/50152.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31069/50152.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31070/50152.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31071/50152.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31072/50152.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31068/50153.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31069/50153.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31070/50153.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31071/50153.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31072/50153.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31068/50154.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31069/50154.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31070/50154.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31071/50154.png
## Map from URL : http://tile.stamen.com/toner-lite/17/31072/50154.png
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead
You will see some additional examples using this data a bit later.
The city government of Kansas City, Missouri has an open data initiative and one of their more interesting data sets is information from their 311 line. The 311 line is a phone number you can dial locally that will allow you to contact the city government to register a complaint or concern.
Please download the 311 center center service requests file from
https://data.kcmo.org/browse?category=311
fn <- "311_Call_Center_Service_Requests.csv"
cc <- read.csv(file=fn, header=TRUE, as.is=TRUE)
names(cc) %<>% tolower
str(cc)
This file does not need any serious data manipulation. Here’s a quick default plot of the phyiscal location of 100 of the phone calls.
cc_sub <- cc[1:100,]
qmplot(longitude, latitude, data=cc_sub)
You can get and store a map that covers the longitude and latitudue values in the GPX file. This allows you a bit more flexibility in how you plot data on the map.
The bounding box is a range that helps insure that every longitude and latitude value falls inside the map.
bb <- make_bbox(lon, lat, data=gpx)
print(bb)
## left bottom right top
## -94.66828 38.86852 -94.65759 38.88440
The zoom value is a number between 1 and 22. A large number (like 3) would show an entire continent. A medium number (like 10) would show an entire city. A small number (like 17) would show a few city blocks.
"2401 Gillham Road Kansas City MO 64108" %>%
geocode %>%
get_map(zoom=17) %>%
ggmap
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=2401%20Gillham%20Road%20Kansas%20City%20MO%2064108&sensor=false
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=39.083761,-94.576988&zoom=17&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
"2401 Gillham Road Kansas City MO 64108" %>%
geocode %>%
get_map(zoom=10) %>%
ggmap
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=2401%20Gillham%20Road%20Kansas%20City%20MO%2064108&sensor=false
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=39.083761,-94.576988&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
"2401 Gillham Road Kansas City MO 64108" %>%
geocode %>%
get_map(zoom=4) %>%
ggmap
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=2401%20Gillham%20Road%20Kansas%20City%20MO%2064108&sensor=false
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=39.083761,-94.576988&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
You can ask for a value of zoom that is just the right size for your bounding box, though, for reasons that elude me, I have needed to adjust the zoom level downward by two levels.
zm <- calc_zoom(bb, adjust=-2L)
print(zm)
## [1] 15
Once you have a bounding box and a zoom level, you can get a map that will cover all of your data values.
Note the stucture of the objects created. The get_map function creates, by default, a 1280 by 1280 matrix of character strings that represent rgb color values in hexadecimal. Note the attributes that are stored with this matrix.
The ggmap function creates an object that is mostly a bunch of functions for displaying things. This fits within the framework of the ggplot2 library. You can display the map by itself just by typing the name of the object, or you can elements to the map with functions like geom_point or geom_text.
mp <- get_map(bb, zoom=zm)
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=38.876458,-94.662933&zoom=15&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
# str(mp)
im <- ggmap(mp)
gpx %>%
mutate(km=dst %/% 1000) %>%
distinct(km, .keep_all=TRUE) -> mark
im +
geom_path(aes(x=lon, y=lat), data=gpx) +
geom_label(aes(x=lon, y=lat, label=km), data=mark)
gpx %>%
mutate(t_minutes=tim %/% 60) %>%
distinct(t_minutes, .keep_all=TRUE) -> bars
speed <- data.frame(
kph=(3600/1000) * diff(bars$dst) / diff(bars$tim),
t=1:max(bars$t_minutes))
ggplot(data=speed, aes(x=t, y=kph)) +
geom_col()